🧪 PRACTICAL 5 – Clustering & Association Rules in RStudio


🔹 PART 1: CLUSTERING (Hierarchical Clustering)


✅ Step 1: Data Preparation
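# Draw 40 random row indices from the 150 rows of iris
idx <- sample(seq_len(nrow(iris)), 40)
irisSample <- iris[idx, ]
# Drop the Species label so only the four numeric measurements remain
irisSample$Species <- NULL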

👉 What it does:

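Randomly selects 40 of the 150 rows of the iris dataset (a different sample on every run unless you call set.seed() first)
Removes the Species column so that only the four numeric measurements are used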
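
👉 Optional: a minimal sketch of making the sample reproducible. The seed value 42 is an arbitrary assumption, not part of the original practical; any fixed number works.

```r
# Fixing the seed makes sample() return the same 40 rows on every run.
# NOTE: 42 is an arbitrary illustrative value.
set.seed(42)
idx <- sample(seq_len(nrow(iris)), 40)
irisSample <- iris[idx, ]
irisSample$Species <- NULL

# Drawing again with the same seed gives the same indices
set.seed(42)
idx2 <- sample(seq_len(nrow(iris)), 40)
identical(idx, idx2)

dim(irisSample)   # 40 rows, 4 numeric columns
```

With a fixed seed the dendrogram in the later steps looks the same each time you run the script, which makes it easier to compare results with classmates.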

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
✅ Step 2: Distance Calculation
dist_data <- dist(irisSample)

👉 Computes the distance between every pair of rows (Euclidean distance by default) and stores the result as a distance matrix
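
👉 To see what dist() computes, here is a small sketch that reproduces one entry by hand. It uses the first five rows of iris (an illustrative choice) instead of the random sample so the numbers are the same for everyone.

```r
# First five flowers, numeric columns only (illustrative subset)
x <- iris[1:5, 1:4]

d <- dist(x)        # Euclidean distance is the default method
m <- as.matrix(d)   # full square matrix, easier to index

# Euclidean distance between row 1 and row 2, computed manually:
# square root of the sum of squared differences over the four columns
manual <- sqrt(sum((unlist(x[1, ]) - unlist(x[2, ]))^2))

manual                       # about 0.5385
all.equal(m[1, 2], manual)   # TRUE
```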

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
✅ Step 3: Apply Clustering
hc <- hclust(dist_data, method = "average")

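👉 Performs agglomerative hierarchical clustering with average linkage: the two closest clusters are merged step by step, where closeness is the mean distance between their members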
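
👉 The tree itself does not give cluster labels. A minimal sketch of extracting them with cutree() follows; k = 3 is an assumption based on iris having three species, and the seed 42 is arbitrary.

```r
set.seed(42)                                   # arbitrary seed, for reproducibility only
idx <- sample(seq_len(nrow(iris)), 40)
hc <- hclust(dist(iris[idx, 1:4]), method = "average")

# Cut the tree into k groups (k = 3 is an assumption, see above)
groups <- cutree(hc, k = 3)

# Cross-tabulate cluster number against the true species
table(groups, iris$Species[idx])
```

Rows of the table are cluster numbers and columns are species, so a clean clustering shows each species concentrated in one row.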

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
✅ Step 4: Plot Dendrogram
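plot(hc, hang = -1, labels = iris$Species[idx])

📊 Output:
Tree diagram (dendrogram) with each leaf labelled by its species
Flowers that join low in the tree are the most similar; hang = -1 lines all labels up at the bottom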
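
👉 Optional: clusters can be outlined directly on the dendrogram with rect.hclust(). This is a sketch only; k = 3 and the red border are illustrative assumptions, and the seed 42 is arbitrary.

```r
set.seed(42)                                   # arbitrary seed, for reproducibility only
idx <- sample(seq_len(nrow(iris)), 40)
hc <- hclust(dist(iris[idx, 1:4]), method = "average")

plot(hc, hang = -1, labels = iris$Species[idx])

# Draw a box around each of the k clusters on the existing plot.
# The return value lists the members of each boxed cluster.
boxes <- rect.hclust(hc, k = 3, border = "red")
lengths(boxes)   # how many flowers fall into each box
```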

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
🔹 PART 2: ASSOCIATION RULE MINING
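⚠️ IMPORTANT FIX (Your previous error)

❌ Don’t use (works only if this exact file exists on your machine):

load("D:/titanic.raw.rdata")

✔️ Use the built-in Titanic dataset instead (no file needed):

data("Titanic")
titanic.counts <- as.data.frame(Titanic)   # 32 rows of counts with a Freq column
# Repeat each row Freq times: one row per passenger (2201 rows), Freq dropped
titanic.raw <- titanic.counts[rep(seq_len(nrow(titanic.counts)), titanic.counts$Freq), 1:4]
rownames(titanic.raw) <- NULL
str(titanic.raw)

👉 as.data.frame(Titanic) alone is not enough: it gives one row per combination of Class, Sex, Age and Survived plus a count, so apriori would treat each combination as a single passenger.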
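
👉 A quick sanity check, sketched in base R, shows the difference between the count table and a one-row-per-passenger data frame. The names counts and expanded are illustrative only.

```r
data("Titanic")

# The built-in object is a table of counts, not raw records
counts <- as.data.frame(Titanic)
nrow(counts)       # 32 combinations of Class, Sex, Age, Survived
sum(counts$Freq)   # 2201 passengers in total

# Repeat each combination Freq times and keep the four factor columns
expanded <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                   c("Class", "Sex", "Age", "Survived")]
nrow(expanded)             # 2201
table(expanded$Survived)   # No: 1490, Yes: 711
```

Association rules are counted over rows, so the data must have one row per passenger before it is passed to apriori.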

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
✅ Step 1: Install Package
install.packages("arules")

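👉 If RStudio asks for a CRAN mirror, select one near you, e.g. India (Bengaluru). Installation is needed only once per machine.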


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
✅ Step 2: Load Library
library(arules)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
✅ Step 3: Generate Basic Rules
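rules <- apriori(titanic.raw)
inspect(rules)

📊 Output:
With the default thresholds (support 0.1, confidence 0.8), rules like:
Class=Crew → Sex=Male
Survived=No → Sex=Male
The right-hand side can be any item here, not only Survived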
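
👉 Support and confidence can be checked by hand from the built-in counts. The sketch below does this for the rule Class=Crew → Sex=Male, chosen here purely as an illustration; the variable names are hypothetical.

```r
data("Titanic")
counts <- as.data.frame(Titanic)
n <- sum(counts$Freq)                      # 2201 passengers

lhs  <- counts$Class == "Crew"             # left-hand side holds
both <- lhs & counts$Sex == "Male"         # left- and right-hand side both hold

# support    = share of all passengers matching both sides
# confidence = share of left-hand-side passengers that also match the right-hand side
support    <- sum(counts$Freq[both]) / n
confidence <- sum(counts$Freq[both]) / sum(counts$Freq[lhs])

round(c(support = support, confidence = confidence), 3)
# support about 0.392, confidence about 0.974
```

Both values are above the apriori defaults of 0.1 and 0.8, which is why a rule like this passes without any extra parameters.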


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
🔹 PART 3: ADVANCED RULES (IMPORTANT ⭐)
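# Keep only rules with at least 2 items, support >= 0.005 and confidence >= 0.8,
# and allow only Survived=No / Survived=Yes on the right-hand side
rules <- apriori(titanic.raw,
                 parameter = list(minlen = 2, supp = 0.005, conf = 0.8),
                 appearance = list(rhs = c("Survived=No", "Survived=Yes"), default = "lhs"),
                 control = list(verbose = FALSE))

# Sort so that the rules with the highest lift come first
rules.sorted <- sort(rules, by = "lift")
inspect(rules.sorted)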
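
👉 Lift compares a rule's confidence with how common the right-hand side is overall. A hand-computed sketch for one candidate rule, Class=2nd & Age=Child → Survived=Yes (picked as an illustration), using only base R:

```r
data("Titanic")
counts <- as.data.frame(Titanic)
n <- sum(counts$Freq)

lhs <- counts$Class == "2nd" & counts$Age == "Child"
rhs <- counts$Survived == "Yes"

support    <- sum(counts$Freq[lhs & rhs]) / n                      # 24 / 2201
confidence <- sum(counts$Freq[lhs & rhs]) / sum(counts$Freq[lhs])  # 24 / 24
# lift = confidence divided by the overall share of survivors
lift       <- confidence / (sum(counts$Freq[rhs]) / n)

round(c(support = support, confidence = confidence, lift = lift), 3)
# support about 0.011, confidence 1, lift about 3.096
```

A lift above 1 means the left-hand side makes the outcome more likely than it is in the whole dataset. This rule clears supp = 0.005 and conf = 0.8, but it would be dropped by the default support of 0.1.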

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
🔹 PART 4: REMOVE REDUNDANT RULES

✅ Step 1: Identify Redundant Rules
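# Entry [i, j] is TRUE when all items of rule i are contained in rule j.
# sparse = FALSE asks for an ordinary logical matrix (recent arules versions return a sparse one by default)
subset.matrix <- is.subset(rules.sorted, rules.sorted, sparse = FALSE)
# Keep only entries above the diagonal: each rule is compared with the higher-lift rules before it
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
# A rule is redundant if some earlier, more general rule is a subset of it
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
which(redundant)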
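
👉 The matrix trick is easier to follow on a toy example. The sketch below rebuilds the same logic in base R for three hand-written rules, assumed to be already sorted by decreasing lift; the rule contents and names r1 to r3 are illustrative only.

```r
# Each toy rule is written as the full set of items it contains (lhs and rhs together)
toy.rules <- list(
  r1 = c("Class=2nd", "Age=Child", "Survived=Yes"),
  r2 = c("Class=2nd", "Sex=Female", "Age=Child", "Survived=Yes"),
  r3 = c("Class=Crew", "Sex=Female", "Survived=Yes")
)
n <- length(toy.rules)

# Entry [i, j] is TRUE when every item of rule i also appears in rule j
subset.matrix <- matrix(FALSE, n, n,
                        dimnames = list(names(toy.rules), names(toy.rules)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    subset.matrix[i, j] <- all(toy.rules[[i]] %in% toy.rules[[j]])
  }
}

# Blank out the diagonal and everything below it, then count per column
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
redundant   # r1 FALSE, r2 TRUE, r3 FALSE
```

Here r2 is flagged because the more general r1, which ranks above it, is contained in it. arules also provides is.redundant(), which uses a related but not identical definition, so its result can differ from this approach.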

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
✅ Step 2: Remove Them
rules.pruned <- rules.sorted[!redundant]
inspect(rules.pruned)

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
📊 Final Output:
A shorter rule list with the redundant rules removed
Easier to see which combinations of class, sex and age are associated with survival